
    Large-Scale Plant Classification with Deep Neural Networks

    This paper discusses the potential of applying deep learning techniques to plant classification and their usage for citizen science in large-scale biodiversity monitoring. We show that plant classification using near state-of-the-art convolutional network architectures like ResNet50 achieves significant improvements in accuracy over the most widespread plant classification application, on test sets composed of thousands of different species labels. We find that the predictions can be confidently used as a baseline classification in citizen science communities like iNaturalist (or its Spanish fork, Natusfera), which in turn can share their data with biodiversity portals like GBIF.
    Comment: 5 pages, 3 figures, 1 table. Published in the Proceedings of the ACM Computing Frontiers Conference 2017
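
    As a rough illustration of the kind of pipeline the paper describes, the sketch below fine-tunes an ImageNet-pretrained ResNet50 on a folder-per-species image dataset in PyTorch. The dataset path, species count and hyperparameters are hypothetical stand-ins, not the authors' code.

```python
# Minimal sketch: fine-tuning ResNet50 for many-class plant classification.
# Assumes a hypothetical "plants/train" folder with one subfolder per species.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

NUM_SPECIES = 1000  # assumption: order of magnitude used in such challenges

# Standard ImageNet preprocessing, since ResNet50 is pretrained on ImageNet.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

train_set = datasets.ImageFolder("plants/train", transform=preprocess)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, NUM_SPECIES)  # new classifier head

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:  # one epoch shown; real training runs many
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```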

    Instance-based Bird Species Identification with Undiscriminant Features Pruning

    This paper reports the participation of Inria in the audio-based bird species identification challenge of the LifeCLEF 2014 campaign. Inspired by recent works on fine-grained image classification, we introduce an instance-based classification scheme based on the dense indexing of MFCC features and the pruning of the non-discriminant ones. To make such a strategy scalable to the 30M MFCC features extracted from the tens of thousands of audio recordings of the training set, we used high-dimensional hashing techniques coupled with an efficient approximate nearest-neighbors search algorithm with controlled quality. Further improvements are obtained by (i) using a sliding classifier with max pooling, (ii) weighting the query features according to their semantic coherence, and (iii) making use of the metadata to filter incoherent species. Results show the effectiveness of the proposed technique, which ranked 3rd among the 10 participating groups.
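
    As a toy sketch of that instance-based voting idea, the snippet below indexes MFCC frames from training recordings and lets each query frame vote for the species of its nearest neighbors. scikit-learn's exact k-NN stands in for the paper's hashing-based approximate search, librosa provides the MFCCs, and file names and parameters are illustrative.

```python
# Toy sketch of instance-based species identification over MFCC frames.
# Exact k-NN replaces the paper's high-dimensional hashing for brevity.
from collections import Counter

import librosa
import numpy as np
from sklearn.neighbors import NearestNeighbors

def mfcc_frames(path, n_mfcc=13):
    """Return one MFCC feature vector per audio frame, shape (frames, n_mfcc)."""
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

# Index every frame of every training recording, remembering its species.
train = [("recording1.wav", "species_a"), ("recording2.wav", "species_b")]
features, labels = [], []
for path, species in train:
    frames = mfcc_frames(path)
    features.append(frames)
    labels += [species] * len(frames)
index = NearestNeighbors(n_neighbors=5).fit(np.vstack(features))

def identify(path):
    """Each query frame votes for the species of its nearest indexed frames."""
    _, idx = index.kneighbors(mfcc_frames(path))
    votes = Counter(labels[i] for i in idx.ravel())
    return votes.most_common(1)[0][0]
```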

    LifeCLEF Plant Identification Task 2015

    The LifeCLEF plant identification challenge aims at evaluating plant identification methods and systems at a very large scale, close to the conditions of a real-world biodiversity monitoring scenario. The 2015 evaluation was conducted on a set of more than 100K images illustrating 1000 plant species living in Western Europe. The main originality of this dataset is that it was built through a large-scale participatory sensing platform initiated in 2011, which now involves tens of thousands of contributors. This overview presents the resources and assessments of the challenge in more detail, summarizes the approaches and systems employed by the participating research groups, and provides an analysis of the main outcomes.

    Plant Identification in an Open-world (LifeCLEF 2016)

    The LifeCLEF plant identification challenge aims at evaluating plant identification methods and systems at a very large scale, close to the conditions of a real-world biodiversity monitoring scenario. The 2016 edition was conducted on a set of more than 110K images illustrating 1000 plant species living in Western Europe, built through a large-scale participatory sensing platform initiated in 2011 that now involves tens of thousands of contributors. The main novelty over the previous years is that the identification task was evaluated as an open-set recognition problem, i.e. a problem in which the recognition system has to be robust to unknown, never-seen categories. Beyond brute-force classification across the known classes of the training set, the big challenge was thus to automatically reject the false positive classification hits caused by the unknown classes. This overview presents the resources and assessments of the challenge in more detail, summarizes the approaches and systems employed by the participating research groups, and provides an analysis of the main outcomes.
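
    The simplest rejection strategy for such an open-set task is to threshold the classifier's confidence and answer "unknown" below it; the sketch below does this on softmax outputs. The threshold value is a placeholder, not one used in the challenge.

```python
# Minimal sketch of open-set rejection by thresholding softmax confidence:
# low-confidence predictions are rejected as "unknown" instead of being
# forced into one of the training classes.
import torch
import torch.nn.functional as F

UNKNOWN = -1
THRESHOLD = 0.5  # assumption: would be tuned on held-out data with unknowns

def open_set_predict(logits: torch.Tensor) -> torch.Tensor:
    """logits: (batch, num_known_classes) -> predicted class index or UNKNOWN."""
    probs = F.softmax(logits, dim=1)
    conf, pred = probs.max(dim=1)
    pred[conf < THRESHOLD] = UNKNOWN  # reject uncertain hits
    return pred
```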

    Large-scale Content-based Visual Information Retrieval

    Rather than restricting search to the use of metadata, content-based information retrieval methods attempt to index, search and browse digital objects by means of signatures or features describing their actual content. Such methods have been intensively studied in the multimedia community to allow managing the massive amount of raw multimedia documents created every day (e.g. video will account for 84% of U.S. internet traffic by 2018). Recent years have consequently witnessed a consistent growth of content-aware and multi-modal search engines deployed on massive multimedia data. Popular multimedia search applications such as Google Images, YouTube, Shazam, TinEye or MusicID clearly demonstrated that the first generation of large-scale audio-visual search technologies is now mature enough to be deployed on real-world big data. All these successful applications greatly benefited from 15 years of research on multimedia analysis and efficient content-based indexing techniques. Yet the maturity reached by the first generation of content-based search engines does not preclude an intensive research activity in the field. There are still many hard problems to be solved before we can retrieve any information in images or sounds as easily as we do in text documents. Content-based search methods have to reach a finer understanding of the contents as well as a higher semantic level. This requires modeling the raw signals by more and more complex and numerous features, so the algorithms for analyzing, indexing and searching such features have to evolve accordingly.

    This thesis describes several of my works related to large-scale content-based information retrieval. The different contributions are presented in a bottom-up fashion reflecting a typical three-tier software architecture of an end-to-end multimedia information retrieval system. The lowest layer is only concerned with managing, indexing and searching large sets of high-dimensional feature vectors, whatever their origin or role in the upper levels (visual or audio features, global or part-based descriptions, low or high semantic level, etc.). The middle layer works at the document level and is in charge of analyzing, indexing and searching collections of documents. It typically extracts and embeds the low-level features, implements the querying mechanisms and post-processes the results returned by the lower layer. The upper layer works at the applicative level and is in charge of providing useful and interactive functionalities to the end-user. It typically implements the front-end of the search application, the crawler, and the orchestration of the different indexing and search services.
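
    A schematic rendering of that three-tier split is sketched below; all class and method names are hypothetical illustrations, not the thesis' actual API.

```python
# Schematic sketch of the three-tier retrieval architecture described above.
from abc import ABC, abstractmethod

import numpy as np

class VectorIndex(ABC):
    """Lowest layer: manages and searches raw high-dimensional feature vectors."""
    @abstractmethod
    def add(self, vectors: np.ndarray) -> None: ...
    @abstractmethod
    def search(self, query: np.ndarray, k: int) -> list[int]: ...

class DocumentIndex:
    """Middle layer: document-level analysis and querying over the vector index."""
    def __init__(self, index: VectorIndex, extract):
        self.index = index
        self.extract = extract  # maps a document to its feature vectors
    def add_document(self, doc) -> None:
        self.index.add(self.extract(doc))
    def query(self, doc, k: int = 10) -> list[int]:
        hits = [h for v in self.extract(doc) for h in self.index.search(v, k)]
        return hits  # post-processing (dedup, re-ranking) would happen here

class SearchApplication:
    """Upper layer: user-facing front-end orchestrating the lower services."""
    def __init__(self, docs: DocumentIndex):
        self.docs = docs
    def handle_user_query(self, doc):
        return self.docs.query(doc)
```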

    Improve learning combining crowdsourced labels by weighting Areas Under the Margin

    In supervised learning, for instance in image classification, modern massive datasets are commonly labeled by a crowd of workers. The labels obtained in this crowdsourcing setting are then aggregated for training. The aggregation step generally leverages a per-worker trust score. Yet such worker-centric approaches discard each task's ambiguity. Some intrinsically ambiguous tasks might even fool expert workers, which could eventually be harmful to the learning step. In a standard supervised learning setting, with one label per task and balanced classes, the Area Under the Margin (AUM) statistic is tailored to identify mislabeled data. We adapt the AUM to identify ambiguous tasks in crowdsourced learning scenarios, introducing the Weighted AUM (WAUM). The WAUM is an average of AUMs weighted by worker- and task-dependent scores. We show that the WAUM can help discard ambiguous tasks from the training set, leading to better generalization or calibration performance. We report improvements over feature-blind aggregation strategies both for simulated settings and for the CIFAR-10H crowdsourced dataset.
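
    For context, the underlying AUM statistic is simple to state: for each training sample, average over epochs the gap between the logit of its assigned label and the largest other logit, so persistently negative values flag likely mislabeled or ambiguous samples. A minimal illustration (not the authors' code):

```python
# Minimal sketch of the Area Under the Margin (AUM) statistic that the
# WAUM builds on. Low (negative) AUM flags likely mislabeled/ambiguous data.
import numpy as np

def aum(logits_per_epoch: np.ndarray, assigned_label: int) -> float:
    """logits_per_epoch: (epochs, num_classes) logits recorded during training."""
    margins = []
    for logits in logits_per_epoch:
        others = np.delete(logits, assigned_label)
        margins.append(logits[assigned_label] - others.max())
    return float(np.mean(margins))

# Toy usage: the assigned-label logit keeps losing to class 2, so the AUM
# is negative and the sample is a candidate for pruning.
history = np.array([[2.0, 0.5, 3.1],
                    [1.8, 0.2, 3.5]])
print(aum(history, assigned_label=0))  # -1.4
```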

    A two-head loss function for deep Average-K classification

    Average-K classification is an alternative to top-K classification in which the number of labels returned varies with the ambiguity of the input image but must average to K over all the samples. A simple method to solve this task is to threshold the softmax output of a model trained with the cross-entropy loss. This approach is theoretically proven to be asymptotically consistent, but it is not guaranteed to be optimal for a finite set of samples. In this paper, we propose a new loss function based on a multi-label classification head in addition to the classical softmax one. This second head is trained using pseudo-labels generated by thresholding the softmax head while guaranteeing that K classes are returned on average. We show that this approach allows the model to better capture ambiguities between classes and, as a result, to return more consistent sets of possible classes. Experiments on two datasets from the literature demonstrate that our approach outperforms the softmax baseline, as well as several other loss functions more generally designed for weakly supervised multi-label classification. The gains are larger when the uncertainty is higher, especially for classes with few samples.
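
    The softmax-thresholding baseline mentioned above is easy to make concrete: choose one global threshold on validation softmax outputs so that the returned sets contain K classes on average. A minimal sketch (illustrative, not the paper's two-head training code):

```python
# Minimal sketch of average-K prediction by thresholding softmax outputs:
# pick the single threshold whose predicted sets average K classes per sample.
import numpy as np

def average_k_threshold(probs: np.ndarray, k: float) -> float:
    """probs: (n_samples, n_classes) softmax outputs from a validation set.
    Keeping the n_samples * k largest probabilities overall yields
    k returned classes per sample on average."""
    flat = np.sort(probs.ravel())[::-1]  # all probabilities, descending
    budget = int(round(k * probs.shape[0]))
    return flat[budget - 1]

def predict_sets(probs: np.ndarray, threshold: float) -> list:
    """Return, for each sample, the indices of classes above the threshold."""
    return [np.flatnonzero(p >= threshold) for p in probs]
```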

    Location-Based Plant Species Prediction Using A CNN Model Trained On Several Kingdoms - Best Method Of GeoLifeCLEF 2019 Challenge

    This technical report describes the model that achieved the best performance in the GeoLifeCLEF challenge, the objective of which was to evaluate methods for plant species prediction based on geographical location. Our method is based on an adaptation of the Inception v3 architecture, initially dedicated to the classification of RGB images. We modified the input layer of this architecture so as to process the spatialized environmental tensors as images with 77 distinct channels. Using this architecture, we trained several models that mainly differed in the training data used and in the predicted output classes. One of the main objectives, in particular, was to compare the performance of a model trained with plant occurrences only to that of a model trained on all available occurrences, including species from other kingdoms. Our results show that the global model performs consistently better than the plant-specific model. This suggests that the convolutional neural network is able to capture some inter-dependencies among all species and that this information significantly improves the generalisation capacity of the model for any species.
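
    The input-layer adaptation described above can be made concrete with torchvision's Inception v3: replace the stem convolution so it accepts 77-channel environmental tensors instead of 3-channel RGB. Layer names below follow torchvision; the 77 channels come from the report, while the species count is a placeholder.

```python
# Minimal sketch of adapting an RGB CNN to 77-channel environmental tensors.
# Layer names follow torchvision's Inception v3; the species count is a
# placeholder, not the challenge's actual number of classes.
import torch
import torch.nn as nn
from torchvision import models

N_CHANNELS = 77   # environmental variables stacked as image channels
N_SPECIES = 1000  # placeholder output size

model = models.inception_v3(weights=None, aux_logits=False, init_weights=True)

# Swap the stem convolution: 3 input channels -> 77, same geometry otherwise.
old = model.Conv2d_1a_3x3.conv
model.Conv2d_1a_3x3.conv = nn.Conv2d(
    N_CHANNELS, old.out_channels,
    kernel_size=old.kernel_size, stride=old.stride,
    padding=old.padding, bias=False,
)

# New classification head sized to the species list.
model.fc = nn.Linear(model.fc.in_features, N_SPECIES)

x = torch.randn(2, N_CHANNELS, 299, 299)  # Inception v3 expects 299x299 input
print(model(x).shape)  # torch.Size([2, 1000])
```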